Strong Scaling Study with PackageCompiler
Adam Lyon, Muon g-2 IRMA Analysis, Fermilab @ Home, November 2020
This notebook examines strong scaling properties of Julia IRMA jobs making just one plot. See the strongScaling.jl notebook for background. For this notebook, PackageCompiler.jl was used to compile the functions needed by the job, which improved the Julia startup time significantly.
This notebook answers issue #18 ("Analyze results from Strong Scaling jobs") and the code may be found in PR #20. This file is IRMA/analyses/018_StrongScaling/StrongScaling_pkgc.jl.
What is this notebook?
This is a Pluto.jl notebook and the code here is written in Julia. It is like a Jupyter notebook, but with important differences. The most important difference is that results appear above the code. Another important difference is that Pluto.jl notebooks are reactive: the notebook tracks cell-to-cell dependencies, and when a cell changes, its dependent cells update at the same time, so the notebook is always in a consistent state. This means that even while you are looking at a static HTML representation of the notebook, you can be assured that it is consistent and up-to-date. You'll see that some results have a little triangle next to them; clicking on that will open an expanded view of the results.
The organization of this notebook is that the main results are replicated at the top, with discussion, in the Results section. The plots are stored in variables which you can see below the plot. You can look in the Code section, which has all code for this notebook, to see how the plot was made.
Introduction
This notebook examines strong scaling properties of my Julia IRMA jobs that make one plot of cluster energies. Nearly all of the runs used the maximum of 32 physical cores (tasks) on each node. I ran with 2, 3, 4, 5, 6, 8, 10, 12, 15, and 20 nodes. To look for conflicts between tasks on a node, I also ran 5-node jobs with 2, 4, 8, 16, and 32 tasks per node.
On 11/2, I ran IRMA/jobs/003_StrongScaling/strongScalingJob.jl that was compiled by PackageCompiler.jl from commit 823ce57 of master (PR #22). See this comment in issue #3.
I ran over era 2D data, which amounted to approximately 16B rows. The total HDF5 file size is 250 GB.
PackageCompiler.jl (see documentation) allows for precompilation of the functions that are actually used by the Julia script. A shared object (.so) file is loaded at Julia start time. Using PackageCompiler.jl significantly improves the time it takes to load packages, and functions do not need to be JIT compiled, speeding up the code. In an MPI environment, using PackageCompiler.jl is ideal because it avoids each rank doing its own JIT compilation. The downside of PackageCompiler.jl is that the compiled code in the shared object file is fixed, so if you update packages to new versions, you may have a mismatch. But given that the environment is pretty carefully controlled, this isn't much of a problem.
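The exact build invocation isn't shown in this notebook; here is a minimal sketch of how a sysimage can be built with PackageCompiler.jl (the package list and file paths below are illustrative, not the ones used for this job):

```julia
using PackageCompiler

# Build a shared-object sysimage containing compiled code for the listed
# packages. The precompile_execution_file is a script that exercises the
# functions the job will call, so they get compiled into the image.
# Package names and paths here are illustrative only.
create_sysimage([:HDF5, :MPI];
    sysimage_path = "irma_job.so",
    precompile_execution_file = "precompile_irma.jl")
```

Julia is then started with `julia -J irma_job.so`, so every MPI rank loads the same precompiled image instead of JIT compiling on its own.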
This notebook is an update to a previous analysis Pluto notebook at IRMA/analyses/018_StrongScaling/StrongScaling.jl. That analysis did not use PackageCompiler.jl, and there seemed to be significant time spent loading packages and compiling. This notebook, using PackageCompiler.jl, shows significantly more stable and improved timings.
There are occasional anomalies. I ran one 10 node job that took an extremely long time; running it again produced more normal looking timings.
Results
There are several types of results.
Histogram comparison
This was done in the previous notebook without PackageCompiler.jl. I did not re-run this analysis.
MPI Timing information
I record the time in the job with an IRMA Stopwatch. The stopwatch uses MPI.Wtime under the hood. The times are recorded as follows.
| Label | Meaning |
|---|---|
| start | After packages are loaded, functions are defined, and MPI.Init is called |
| openedFile | After the h5open statement |
| openedDataSet | After the energy dataset is opened (but no data read yet) |
| determineRanges | After the ranges to examine are determined with partitionDS |
| readDataSet | After the dataset is read (this reads the actual data for the rank) |
| makeHistogram | After the data is histogrammed |
| gatheredHistograms | After all the histograms have been gathered to the root rank |
| reducedHistograms | After the histograms have been reduced to the root rank |
| gatherRankLogs | After the rank's log info has been gathered to the root rank |
Timing of anything before start is not recorded.
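The IRMA Stopwatch itself isn't reproduced here; the pattern is roughly the following sketch (an assumed interface, using `time()` in place of `MPI.Wtime()` so it runs outside MPI):

```julia
# Minimal stopwatch sketch: records labeled elapsed times since construction.
# The real IRMA Stopwatch uses MPI.Wtime() so all ranks share a clock source.
mutable struct SimpleStopwatch
    t0::Float64
    labels::Vector{Symbol}
    times::Vector{Float64}
end
SimpleStopwatch() = SimpleStopwatch(time(), Symbol[], Float64[])

# Record the elapsed seconds since construction under a label
function record!(sw::SimpleStopwatch, label::Symbol)
    push!(sw.labels, label)
    push!(sw.times, time() - sw.t0)
    return sw
end

sw = SimpleStopwatch()        # anything before this point is not timed
record!(sw, :start)           # e.g. after MPI.Init
record!(sw, :openedFile)      # e.g. after h5open
```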
Let's look at the timing plots. Note that you can see all the timing plots in the Code section. Some representative plots will be reproduced here.
Here is a run using three nodes (and 32 ranks per node).
Each rank reads about 175,375,305 rows (or one more). These timings are slightly better than without PackageCompiler.jl (package loading is not shown here).
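The even split of rows across ranks ("or one more") can be sketched as follows; this is a hypothetical stand-in for IRMA's partitionDS, which does the actual range assignment:

```julia
# Sketch: split `nrows` rows across `nranks` ranks. Each rank gets the base
# count fld(nrows, nranks); the first rem(nrows, nranks) ranks get one extra.
# `rank` is 0-based, as in MPI.
function rowRange(nrows, nranks, rank)
    base, extra = divrem(nrows, nranks)
    len = base + (rank < extra ? 1 : 0)
    first = rank * base + min(rank, extra) + 1
    return first:(first + len - 1)
end

rowRange(10, 3, 0)   # → 1:4  (rank 0 gets the extra row)
rowRange(10, 3, 2)   # → 8:10
```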
And here's ten nodes...
Again, we see that some nodes are faster than others for reading.
Here's a box and whisker plot of total MPI time...
(plot: `strongScalingPlot`)

Let's just look at the reading time and see if it scales appropriately.
(plot: `dataSetReadTimePlot`)

There's a bit of variation. Look at the plots in the Code section to see why.
Let's now concentrate on seeing if the reading time scales. We'll take the mean read time for each run...
(plot: `readTimePlot`)

The expected time is anchored at the 2 node run. So a four node run should be twice as fast if the scaling were perfect. And what we see is pretty close.
10 rows × 4 columns
| | numNodes | readDataSet_maximum | readDataSet_mean | expectedFromMean |
|---|---|---|---|---|
| | Int64 | Float64 | Float64 | Float64 |
| 1 | 2 | 36.7791 | 34.7144 | 34.7144 |
| 2 | 3 | 25.3889 | 23.6609 | 23.1429 |
| 3 | 4 | 18.845 | 17.7989 | 17.3572 |
| 4 | 5 | 15.5825 | 14.2538 | 13.8858 |
| 5 | 6 | 12.8665 | 11.9569 | 11.5715 |
| 6 | 8 | 9.9717 | 9.24282 | 8.6786 |
| 7 | 10 | 9.3017 | 7.8274 | 6.94288 |
| 8 | 12 | 8.17051 | 6.70655 | 5.78573 |
| 9 | 15 | 6.33904 | 5.63782 | 4.62859 |
| 10 | 20 | 6.51274 | 5.26359 | 3.47144 |
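The `expectedFromMean` column is anchored at the smallest run: under perfect strong scaling, n-fold more nodes gives an n-fold smaller time. A small sketch of that calculation, using values from the table above:

```julia
# Perfect-scaling expectation anchored at the first (smallest) node count
expectedFromAnchor(n, t) = t[1] ./ (n ./ n[1])

nodes    = [2, 4, 8]
meanRead = [34.7144, 17.7989, 9.24282]   # mean read times from the table
expectedFromAnchor(nodes, meanRead)      # → [34.7144, 17.3572, 8.6786]
```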
We can also look at reducing the number of ranks per node to see if there's any significant i/o contention. Here is the read time for 5 nodes but varying the number of ranks per node.
(plot: `max5ReadTimesPlot`)

Let's calculate the fraction of time in contention; that is, the difference between the measured read time and the expected time, divided by the measured read time.
(plot: `max5ReadTimesDiffPlot`)

So the above plot suggests that fewer ranks per node is more efficient. When we ask for 32 tasks per node, we are possibly wasting about 30% of the read time in i/o contention. Here's a table...
5 rows × 6 columns
| | numTasks | readDataSet_maximum | readDataSet_mean | expectedFromMean | diff | diffPerc |
|---|---|---|---|---|---|---|
| | Int64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 2 | 161.71 | 161.093 | 161.093 | 0.0 | 0.0 |
| 2 | 4 | 83.0948 | 82.5291 | 80.5466 | 1.98251 | 2.4022 |
| 3 | 8 | 46.1693 | 45.4499 | 40.2733 | 5.17663 | 11.3897 |
| 4 | 16 | 27.7527 | 25.9477 | 20.1367 | 5.81102 | 22.3951 |
| 5 | 32 | 15.5825 | 14.2538 | 10.0683 | 4.18546 | 29.3638 |
Remember, there are always 5 nodes in the job.
Accounting Information
We can extract accounting information from the SLURM batch system to see how long the jobs took.
The total batch time is the total wall clock time for the job. The total Julia time is how much time the srun julia ... command took within the batch script. The MPI Julia time is the amount of time that was recorded within MPI. Note that the latter does not include the time it took to start Julia and load packages as well as the time to write the results to disk.
The difference between the total batch and julia times are likely due to setting up the requested number of nodes. This time appears to be significant when requesting a large number of nodes. The total Julia time shape is puzzling, though it appears it is most efficient around six or seven nodes. Not clear why it increases and why it appears to turn over before twenty nodes.
Conclusions
These results are improved and more stable by using PackageCompiler.jl. Things that I conclude from this study.
Use
PackageCompiler.jl32 tasks per node seems to involve significant i/o contention
Configuring a large number of nodes takes a signifcant amount of time.
Code
xxxxxxxxxxmd"""## Code"""xxxxxxxxxx# Wide screenhtml"""<style>main { max-width: 1100px;}"""xxxxxxxxxx# Activate the environmentbegin import Pkg Pkg.activate(".") using Reviseendxxxxxxxxxxusing IRMA, JLD2, FileIO, Glob, Pipe"/Users//lyon/Development/gm2/data/003_StrongScaling_pkgc/"xxxxxxxxxxconst datapath = "/Users//lyon/Development/gm2/data/003_StrongScaling_pkgc/""histos_10x32.jld2"
"histos_12x32.jld2"
"histos_15x32.jld2"
"histos_20x32.jld2"
"histos_2x32.jld2"
"histos_3x32.jld2"
"histos_4x32.jld2"
"histos_5x32.jld2"
"histos_6x32.jld2"
"histos_8x32.jld2"
xxxxxxxxxxhistoFiles32 = glob("histos_*32.jld2", datapath) |> basename.(_)Timing information
xxxxxxxxxxusing DataFramesextractNNodesFromFileName (generic function with 1 method)xxxxxxxxxx# Extract number of nodesfunction extractNNodesFromFileName(fileName::String) m = match(r"histos_(\d+)x32", fileName) m.captures[1] |> parse(Int, _)end10
12
15
20
2
3
4
5
6
8
xxxxxxxxxxextractNNodesFromFileName.(histoFiles32)dataFrameFromRankData (generic function with 1 method)xxxxxxxxxx# Read histos_nx32.jld2 file and return a dataframefunction dataFrameFromRankData(fileName::String, extractFcn) numNodes = extractFcn(fileName) data = load(joinpath(datapath, fileName)) # Load the JLD2 file rt = rankTimings(data["allTimings"]) # Extract the rank timings rl = data["allRankLogs"] # Get the log info # Number of rows processed numRanks = length(rl) # How many ranks? # Construct the DataFrame by columns df = DataFrame(numNodes=fill(numNodes, numRanks), rank=0:numRanks-1, numRows=[r.len for r in rl]) df = hcat(df, DataFrame(rt)) # Convert the named tuple of timings to DataFrame df = hcat(df, DataFrame(totalTime=rankTotalTime(data["allTimings"]))) return dfendxxxxxxxxxxusing PlutoDataTablexxxxxxxxxx# Make dataframe from all of the filesdf = let df = vcat( dataFrameFromRankData.(histoFiles32, extractNNodesFromFileName)...); sort!(df) dfend;xxxxxxxxxxgdf = groupby(df, :numNodes);2
3
4
5
6
8
10
12
15
20
xxxxxxxxxxtheNumNodes = [ k[1] for k in keys(gdf) ]xxxxxxxxxxbegin using Plots using StatsPlots gr()endplotsForRun (generic function with 1 method)xxxxxxxxxx# Make plots for a groupfunction plotsForRun(df) cols = 3:ncol(df) # Don't plot numNodes and rank columns p = [] for i in cols yaxis = i==3 ? "# rows read" : "seconds" push!(p, scatter(df.rank, df[i], legend=nothing, title=names(df)[i], xaxis="Rank", yaxis=yaxis, xticks=0:32:20*32, titlefontsize=11, xguidefontsize=8, markersize=2)) end pendxxxxxxxxxxusing PlutoUIxxxxxxxxxx e Slider(1:length(gdf))Plots for run with 15 nodes (32 ranks per node)
xxxxxxxxxxmd"""### Plots for run with $(theNumNodes[e]) nodes (32 ranks per node)"""xxxxxxxxxxplot(plotsForRun(gdf[e])..., size=(1000,700), layout=(5,2))xxxxxxxxxxstrongScalingPlot = df boxplot(:numNodes, :totalTime, legend=nothing, title="Strong Scaling Study (one plot)", xaxis="Number of nodes", yaxis="Total time(s)", size=(800,600))10 rows × 2 columns
| numNodes | totalTime_maximum | |
|---|---|---|
| Int64 | Float64 | |
| 1 | 2 | 43.576 |
| 2 | 3 | 30.7742 |
| 3 | 4 | 23.5602 |
| 4 | 5 | 19.8548 |
| 5 | 6 | 16.8802 |
| 6 | 8 | 13.6648 |
| 7 | 10 | 12.8117 |
| 8 | 12 | 11.6279 |
| 9 | 15 | 9.69708 |
| 10 | 20 | 9.58383 |
xxxxxxxxxxtotalMPITimes = combine(gdf, :totalTime => maximum)xxxxxxxxxxmaxMPITimesPlot = totalMPITimes scatter(:numNodes, :totalTime_maximum, legend=nothing, xaxis="Number of nodes", yaxis="Maximum total time", ylim=(0, 50))Let's look at the read time scaling. Determine the average read time
xxxxxxxxxxdataSetReadTimePlot = df boxplot(:numNodes, :readDataSet, legend=nothing, title="Dataset read time", xaxis="Number of nodes", yaxis="read time (s)", size=(800,600))xxxxxxxxxxusing Statistics10 rows × 4 columns
| numNodes | readDataSet_maximum | readDataSet_mean | expectedFromMean | |
|---|---|---|---|---|
| Int64 | Float64 | Float64 | Float64 | |
| 1 | 2 | 36.7791 | 34.7144 | 34.7144 |
| 2 | 3 | 25.3889 | 23.6609 | 23.1429 |
| 3 | 4 | 18.845 | 17.7989 | 17.3572 |
| 4 | 5 | 15.5825 | 14.2538 | 13.8858 |
| 5 | 6 | 12.8665 | 11.9569 | 11.5715 |
| 6 | 8 | 9.9717 | 9.24282 | 8.6786 |
| 7 | 10 | 9.3017 | 7.8274 | 6.94288 |
| 8 | 12 | 8.17051 | 6.70655 | 5.78573 |
| 9 | 15 | 6.33904 | 5.63782 | 4.62859 |
| 10 | 20 | 6.51274 | 5.26359 | 3.47144 |
xxxxxxxxxxbegin maxMeanReadTimes = combine(gdf, :readDataSet => maximum, :readDataSet => mean) transform!(maxMeanReadTimes, [:numNodes, :readDataSet_mean] => ( (n, t) -> t[1] ./ (n ./ n[1]) ) => :expectedFromMean)endxxxxxxxxxxreadTimePlot = maxMeanReadTimes scatter(:numNodes, [:readDataSet_mean :expectedFromMean], xaxis="Number of nodes", yaxis="Read time (s)")So there does seem to be a litle bit of contention for large number of nodes).
Examine Fewer tasks per node
"histos_5x16.jld2"
"histos_5x2.jld2"
"histos_5x32.jld2"
"histos_5x4.jld2"
"histos_5x8.jld2"
xxxxxxxxxxconst histoFiles5 = glob("histos_5x*.jld2", datapath) |> basename.(_)extractNTasksFromFileName (generic function with 1 method)xxxxxxxxxx# Extract number of nodesfunction extractNTasksFromFileName(fileName::String) m = match(r"histos_5x(\d+)", fileName) m.captures[1] |> parse(Int, _)end16
2
32
4
8
xxxxxxxxxxextractNTasksFromFileName.(histoFiles5)xxxxxxxxxx# Make dataframe from all of the filesdf5 = let df = vcat( dataFrameFromRankData.(histoFiles5, extractNTasksFromFileName)...); rename!(df, :numNodes => :numTasks) sort!(df) dfend;xxxxxxxxxxgdf5 = groupby(df5, :numTasks);xxxxxxxxxxstrongScalingPlot5 = df5 boxplot(:numTasks, :totalTime, legend=nothing, title="Strong Scaling Study with 5 nodes (one plot)", xaxis="Number of tasks", yaxis="Total time (s)", size=(800,600))5 rows × 2 columns
| numTasks | totalTime_maximum | |
|---|---|---|
| Int64 | Float64 | |
| 1 | 2 | 185.3 |
| 2 | 4 | 96.1674 |
| 3 | 8 | 54.2057 |
| 4 | 16 | 46.0226 |
| 5 | 32 | 19.8548 |
xxxxxxxxxxtotal5MPITimes = combine(gdf5, :totalTime => maximum)5 rows × 3 columns
| numTasks | totalTime_maximum | expected | |
|---|---|---|---|
| Int64 | Float64 | Float64 | |
| 1 | 2 | 185.3 | 185.3 |
| 2 | 4 | 96.1674 | 92.65 |
| 3 | 8 | 54.2057 | 46.325 |
| 4 | 16 | 46.0226 | 23.1625 |
| 5 | 32 | 19.8548 | 11.5812 |
xxxxxxxxxxtransform!(total5MPITimes, [:numTasks, :totalTime_maximum] => ( (n, t) -> t[1] ./ (n./n[1])) => :expected)xxxxxxxxxxmax5MPITimesPlot = total5MPITimes scatter(:numTasks, [:totalTime_maximum :expected], xaxis="Number of tasks", yaxis="Maximum total time", ylim=(0, 200))Let's look at the read times...
5 rows × 6 columns
| numTasks | readDataSet_maximum | readDataSet_mean | expectedFromMean | diff | diffPerc | |
|---|---|---|---|---|---|---|
| Int64 | Float64 | Float64 | Float64 | Float64 | Float64 | |
| 1 | 2 | 161.71 | 161.093 | 161.093 | 0.0 | 0.0 |
| 2 | 4 | 83.0948 | 82.5291 | 80.5466 | 1.98251 | 2.4022 |
| 3 | 8 | 46.1693 | 45.4499 | 40.2733 | 5.17663 | 11.3897 |
| 4 | 16 | 27.7527 | 25.9477 | 20.1367 | 5.81102 | 22.3951 |
| 5 | 32 | 15.5825 | 14.2538 | 10.0683 | 4.18546 | 29.3638 |
xxxxxxxxxxbegin maxMean5ReadTimes = combine(gdf5, :readDataSet => maximum, :readDataSet => mean) transform!(maxMean5ReadTimes, [:numTasks, :readDataSet_mean] => ( (n, t) -> t[1] ./ (n ./ n[1]) ) => :expectedFromMean) transform!(maxMean5ReadTimes, [:readDataSet_mean, :expectedFromMean] => (-) => :diff ) transform!(maxMean5ReadTimes, [:diff, :readDataSet_mean] => ( (d, r) -> d./r * 100) => :diffPerc )endx
max5ReadTimesPlot = maxMean5ReadTimes scatter(:numTasks, [:readDataSet_mean :expectedFromMean], xaxis="Number of tasks", yaxis="Read time time (s)")This actually looks nice - very efficient.
Trying it as a log plot...
xxxxxxxxxxmax5ReadTimesLogPlot = maxMean5ReadTimes scatter(:numTasks, [:readDataSet_mean :expectedFromMean], xaxis="Number of tasks", yaxis="Read time time (s)", yscale=:log10, yticks=(1:15:180, 1:15:180))Plot the difference...
x
max5ReadTimesDiffPlot = maxMean5ReadTimes scatter(:numTasks, :diffPerc, legend=nothing, xaxis="Number of tasks", yaxis="Percentage of time in contention (%)")Examine accounting information
xxxxxxxxxxmd"""## Examine accounting information"""xxxxxxxxxxusing CSV60 rows × 25 columns (omitted printing of 19 columns)
| JobID | JobName | QOS | State | ExitCode | NNodes | |
|---|---|---|---|---|---|---|
| String | String | String? | String | Time… | Int64 | |
| 1 | 35817597 | run_strongScalingJob_compiled.sh | debug_hsw | COMPLETED | 00:00:00 | 2 |
| 2 | 35817597.batch | batch | missing | COMPLETED | 00:00:00 | 1 |
| 3 | 35817597.extern | extern | missing | COMPLETED | 00:00:00 | 2 |
| 4 | 35817597.0 | julia | missing | COMPLETED | 00:00:00 | 2 |
| 5 | 35817702 | run_strongScalingJob_compiled.sh | debug_hsw | COMPLETED | 00:00:00 | 3 |
| 6 | 35817702.batch | batch | missing | COMPLETED | 00:00:00 | 1 |
| 7 | 35817702.extern | extern | missing | COMPLETED | 00:00:00 | 3 |
| 8 | 35817702.0 | julia | missing | COMPLETED | 00:00:00 | 3 |
| 9 | 35817713 | run_strongScalingJob_compiled.sh | debug_hsw | COMPLETED | 00:00:00 | 4 |
| 10 | 35817713.batch | batch | missing | COMPLETED | 00:00:00 | 1 |
| 11 | 35817713.extern | extern | missing | COMPLETED | 00:00:00 | 4 |
| 12 | 35817713.0 | julia | missing | COMPLETED | 00:00:00 | 4 |
| 13 | 35817716 | run_strongScalingJob_compiled.sh | debug_hsw | COMPLETED | 00:00:00 | 5 |
| 14 | 35817716.batch | batch | missing | COMPLETED | 00:00:00 | 1 |
| 15 | 35817716.extern | extern | missing | COMPLETED | 00:00:00 | 5 |
| 16 | 35817716.0 | julia | missing | COMPLETED | 00:00:00 | 5 |
| 17 | 35817738 | run_strongScalingJob_compiled.sh | debug_hsw | COMPLETED | 00:00:00 | 6 |
| 18 | 35817738.batch | batch | missing | COMPLETED | 00:00:00 | 1 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
xxxxxxxxxxsacct = CSV.File(joinpath(datapath, "003_pkgc.csv")) |> DataFrame"slurm-35817597.out"
"slurm-35817702.out"
"slurm-35817713.out"
"slurm-35817716.out"
"slurm-35817738.out"
"slurm-35817773.out"
"slurm-35817861.out"
"slurm-35818006.out"
"slurm-35818160.out"
"slurm-35818272.out"
"slurm-35818396.out"
"slurm-35818594.out"
"slurm-35818775.out"
"slurm-35821168.out"
xxxxxxxxxxslurmLogFiles = glob("slurm-*.out", datapath) |> basename.(_)jobIdFromSlurmLogName (generic function with 1 method)xxxxxxxxxxfunction jobIdFromSlurmLogName(fn) m = match(r"slurm-(\d+)", fn) m.captures[1]end"35817597"
"35817702"
"35817713"
"35817716"
"35817738"
"35817773"
"35817861"
"35818006"
"35818160"
"35818272"
"35818396"
"35818594"
"35818775"
"35821168"
xxxxxxxxxxslurmIds = jobIdFromSlurmLogName.(slurmLogFiles)selectDesiredJobIds (generic function with 1 method)xxxxxxxxxx# Select out the jobIds that we care aboutfunction selectDesiredJobIds(jobIds) sacctM = filter(:JobID => j -> occursin.(jobIds, j) |> any, sacct) # And don't care about the batch or extern jobs (not sure what they are) filter!(:JobName => jn -> jn != "batch" && jn != "extern", sacctM) sacctMend28 rows × 25 columns (omitted printing of 18 columns)
| JobID | JobName | QOS | State | ExitCode | NNodes | NCPUS | |
|---|---|---|---|---|---|---|---|
| String | String | String? | String | Time… | Int64 | Int64 | |
| 1 | 35817597 | run_strongScalingJob_compiled.sh | debug_hsw | COMPLETED | 00:00:00 | 2 | 128 |
| 2 | 35817597.0 | julia | missing | COMPLETED | 00:00:00 | 2 | 64 |
| 3 | 35817702 | run_strongScalingJob_compiled.sh | debug_hsw | COMPLETED | 00:00:00 | 3 | 192 |
| 4 | 35817702.0 | julia | missing | COMPLETED | 00:00:00 | 3 | 96 |
| 5 | 35817713 | run_strongScalingJob_compiled.sh | debug_hsw | COMPLETED | 00:00:00 | 4 | 256 |
| 6 | 35817713.0 | julia | missing | COMPLETED | 00:00:00 | 4 | 128 |
| 7 | 35817716 | run_strongScalingJob_compiled.sh | debug_hsw | COMPLETED | 00:00:00 | 5 | 320 |
| 8 | 35817716.0 | julia | missing | COMPLETED | 00:00:00 | 5 | 160 |
| 9 | 35817738 | run_strongScalingJob_compiled.sh | debug_hsw | COMPLETED | 00:00:00 | 6 | 384 |
| 10 | 35817738.0 | julia | missing | COMPLETED | 00:00:00 | 6 | 192 |
| 11 | 35817773 | run_strongScalingJob_compiled.sh | debug_hsw | COMPLETED | 00:00:00 | 8 | 512 |
| 12 | 35817773.0 | julia | missing | COMPLETED | 00:00:00 | 8 | 256 |
| 13 | 35817861 | run_strongScalingJob_compiled.sh | debug_hsw | COMPLETED | 00:00:00 | 12 | 768 |
| 14 | 35817861.0 | julia | missing | COMPLETED | 00:00:00 | 12 | 384 |
| 15 | 35818006 | run_strongScalingJob_compiled.sh | debug_hsw | COMPLETED | 00:00:00 | 15 | 960 |
| 16 | 35818006.0 | julia | missing | COMPLETED | 00:00:00 | 15 | 480 |
| 17 | 35818160 | run_strongScalingJob_compiled.sh | debug_hsw | COMPLETED | 00:00:00 | 20 | 1280 |
| 18 | 35818160.0 | julia | missing | COMPLETED | 00:00:00 | 20 | 640 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
xxxxxxxxxxsacctM = selectDesiredJobIds(slurmIds)splitIntoJuliaAndTotal (generic function with 1 method)xxxxxxxxxx# Split the table into Julia and total infofunction splitIntoJuliaAndTotal(df) filter!([:NNodes, :JobID] => (n, j) -> n != 5 || occursin("35817716", j), df) # Remove 5 x less than 32 tasks totalInfo = filter(:JobName => jn -> jn != "julia", df) juliaInfo = filter(:JobName => jn -> jn == "julia", df) return totalInfo, juliaInfoendxxxxxxxxxxtotalInfo, juliaInfo = splitIntoJuliaAndTotal(sacctM);xxxxxxxxxxrunTimesPlot = begin totalInfo scatter(:NNodes, :ElapsedRaw, label="Total batch time", xaxis="Number of nodes", yaxis="Time (s)") juliaInfo scatter!(:NNodes, :ElapsedRaw, ylim=(0, 80), label="total julia time") totalMPITimes scatter!(:numNodes, :totalTime_maximum, label="MPI julia time")end